fix(responses): split durable previous_response_id continuity from #211#270
fix(responses): split durable previous_response_id continuity from #211#270yigitkonur wants to merge 1 commit intoSoju06:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces SQLite-backed durable continuity for Responses by persisting completed response “snapshots” and using them to replay previous_response_id across HTTP and WebSocket flows (including restart/bridge-loss scenarios), while also adding a narrow WebSocket retry for early upstream disconnects.
Changes:
- Add durable
response_snapshotsstorage (model + migration + repository) and persist completed Responses into it. - Resolve
previous_response_idby replaying caller-scoped snapshot chains and prefer the originating account when selecting an upstream account. - Retry one in-flight WebSocket request when upstream disconnects before
response.created, and update integration/spec coverage accordingly.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
app/modules/proxy/service.py |
Implements snapshot replay for previous_response_id, snapshot persistence on completion, and early-disconnect WebSocket retry + preferred-account routing. |
app/modules/proxy/response_snapshots_repository.py |
Adds repository for snapshot get/upsert across SQLite/PostgreSQL dialects. |
app/modules/proxy/repo_bundle.py |
Wires response_snapshots into the proxy repository bundle. |
app/modules/proxy/load_balancer.py |
Adds preferred_account_id handling and supports excluding accounts (used by retry + replay preference). |
app/dependencies.py |
Provides ResponseSnapshotsRepository in the default repo factory. |
app/db/models.py |
Adds ResponseSnapshot ORM model and continuity index. |
app/db/alembic/versions/20260327_000000_add_response_snapshots.py |
Migration creating/repairing response_snapshots table and index. |
tests/integration/test_openai_compat_features.py |
Adds restart + API-key scoping coverage for HTTP /v1/responses replay behavior. |
tests/integration/test_proxy_websocket_responses.py |
Adds WebSocket replay + scoping tests and early-disconnect retry test. |
tests/integration/test_http_responses_bridge.py |
Updates HTTP bridge behavior to replay from snapshots when continuity is lost. |
tests/integration/test_migrations.py |
Verifies migrations create response_snapshots and repository scoping/created_at behavior. |
tests/integration/test_load_balancer_integration.py |
Adds coverage for preferred-account selection and fallback. |
app/core/openai/requests.py |
Exposes sanitize_input_items() for reuse in replay normalization. |
openspec/... files |
Updates specs/change artifacts for durable snapshot replay + early-disconnect retry semantics. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| resolved_request = await self._resolve_previous_response_request( | ||
| responses_payload, | ||
| api_key_id=refreshed_api_key.id if refreshed_api_key else None, | ||
| ) | ||
| responses_payload = resolved_request.payload |
There was a problem hiding this comment.
_prepare_websocket_response_create_request() always calls _resolve_previous_response_request(), which strips previous_response_id and rebuilds input from persisted snapshots. This means WebSocket requests never “prefer live upstream continuity” (forwarding previous_response_id to upstream) even when a live upstream session exists, which conflicts with the PR description/spec updates. Consider only replaying from snapshots when live continuity is unavailable (or trying upstream first and falling back to snapshot replay on previous_response_not_found).
| resolved_request = await self._resolve_previous_response_request( | ||
| payload, | ||
| api_key_id=api_key.id if api_key else None, | ||
| ) | ||
| payload = resolved_request.payload |
There was a problem hiding this comment.
_stream_with_retry() now resolves any non-empty previous_response_id via persisted snapshots and raises invalid_request_error when no snapshot exists. This will break existing integration expectations that previous_response_id is forwarded unchanged when the HTTP bridge is disabled (e.g., tests in test_proxy_sticky_sessions.py and WebSocket forwarding tests). If the new contract is intentional, those tests should be updated/removed; otherwise, consider allowing pass-through to upstream when snapshots are missing but live continuity might still exist.
| resolved_request = await self._resolve_previous_response_request( | |
| payload, | |
| api_key_id=api_key.id if api_key else None, | |
| ) | |
| payload = resolved_request.payload | |
| original_payload = payload | |
| try: | |
| resolved_request = await self._resolve_previous_response_request( | |
| payload, | |
| api_key_id=api_key.id if api_key else None, | |
| ) | |
| payload = resolved_request.payload | |
| except AppError: | |
| # If resolving the previous_response_id via snapshots fails (e.g., no snapshot | |
| # exists), fall back to the original payload so that previous_response_id is | |
| # forwarded unchanged to the upstream, preserving existing integration behavior. | |
| payload = original_payload |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6355223279
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if retried is not None: | ||
| account, upstream = retried | ||
| account_id_value = account.id |
There was a problem hiding this comment.
Synchronize retried upstream with the main websocket loop
When a pre-response.created disconnect is retried, this code updates account/upstream only inside _relay_upstream_websocket_messages. The outer proxy_responses_websocket loop still retains the original disconnected socket, so a later client message on the same downstream websocket is sent to stale upstream state and can fail with stream_incomplete even though the retry succeeded. This breaks continued multi-turn websocket use after transient upstream EOFs; the retried connection needs to be propagated back to shared state used by the sender loop.
Useful? React with 👍 / 👎.
Summary
response_idand replayprevious_response_idacross HTTP and WebSocket Responses trafficapi_key_id, prefer the originating account when replaying, and retry one websocket request after an early upstream disconnect beforeresponse.createdNotes
mainTesting
rtk /Users/yigitkonur/dev/codex-lb/.venv/bin/pytest tests/integration/test_migrations.py -qrtk /Users/yigitkonur/dev/codex-lb/.venv/bin/pytest tests/integration/test_load_balancer_integration.py -qrtk /Users/yigitkonur/dev/codex-lb/.venv/bin/pytest tests/integration/test_openai_compat_features.py -qrtk /Users/yigitkonur/dev/codex-lb/.venv/bin/pytest tests/integration/test_proxy_websocket_responses.py -q -k "replays_previous_response_after_restart or previous_response_id_is_scoped_to_api_key or retries_once_after_upstream_eof_before_response_created"rtk /Users/yigitkonur/dev/codex-lb/.venv/bin/pytest tests/integration/test_http_responses_bridge.py -q -k "replays_previous_response_after_bridge_loss or replays_previous_response_without_opening_fresh_session or send_failure_replays_previous_response_from_snapshot or precreated_disconnect_replays_previous_response_from_snapshot"rtk /Users/yigitkonur/dev/codex-lb/.venv/bin/python -m compileall app testsrtk proxy openspec validate --specs